The 2019-2020 Coronavirus Pandemic Analysis

Contact: Smith Research

BACKGROUND & APPROACH

I wanted to track and trend the coronavirus outbreak on my own curiosity. There are some interesting questions that may fall out of this, as it is a very historic moment, including scientifically and analytically (we have a large amount of data being shared across the globe, analyzed in real-time). The world has come to a halt because of it.
This analysis attempts to answer the following questions (more to come):

  1. What does the trend of the pandemic look like to date?
  2. What are future case predictions based on historical model?
  3. What interesting quirks or patterns emerge?

ASSUMPTIONS & LIMITATIONS: * This data is limited by the source. I realized early on that depending on source there were conflicting # of cases. Originally I was using JHU data… but this was always ‘ahead’ of the Our World In Data. I noticed that JHU’s website was buggy- you clicked on the U.S. stats but it didn’t reflect the U.S.. So I changed data sources to be more consistent with what is presented in the media (and Our World In Data has more extensive plots I can compare my own to). An interesting aside might be why the discrepancy? Was I missing something?
* Defintiions are important as is the idea that multiple varibales accumulate in things like total cases (more testing for example).

SOURCE RAW DATA: * https://ourworldindata.org/coronavirus
* https://github.com/CSSEGISandData/COVID-19/
*

INPUT DATA LOCATION: github (https://github.com/sbs87/coronavirus/tree/master/data)

OUTPUT DATA LOCATIOn: github (https://github.com/sbs87/coronavirus/tree/master/results)

TIMESTAMP

Start: ##—— Mon Apr 20 20:53:39 2020 ——##

PRE-ANALYSIS

The following sections are outside the scope of the ‘analysis’ but are still needed to prepare everything

UPSTREAM PROCESSING/ANALYSIS

  1. Google Mobility Scraping, script available at get_google_mobility.py
# Mobility data has to be extracted from Google PDF reports using a web scraping script (python , written by Peter Simone, https://github.com/petersim1/MIT_COVID19)

# See get_google_mobility.py for local script 

python3 get_google_mobility.py
# writes csv file of mobility data as "mobility.csv"

SET UP ENVIORNMENT

Load libraries and set global variables

# timestamp start
timestamp()
## ##------ Mon Apr 20 20:53:39 2020 ------##

# clear previous enviornment
rm(list = ls())

##------------------------------------------
## LIBRARIES
##------------------------------------------
library(plyr)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.3
## ✓ tibble  3.0.0     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::arrange()   masks plyr::arrange()
## x purrr::compact()   masks plyr::compact()
## x dplyr::count()     masks plyr::count()
## x dplyr::failwith()  masks plyr::failwith()
## x dplyr::filter()    masks stats::filter()
## x dplyr::id()        masks plyr::id()
## x dplyr::lag()       masks stats::lag()
## x dplyr::mutate()    masks plyr::mutate()
## x dplyr::rename()    masks plyr::rename()
## x dplyr::summarise() masks plyr::summarise()
## x dplyr::summarize() masks plyr::summarize()
library(ggplot2)
library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(plot.utils)
library(utils)
library(knitr)

##------------------------------------------

##------------------------------------------
# GLOBAL VARIABLES
##------------------------------------------
user_name <- Sys.info()["user"]
working_dir <- paste0("/Users/", user_name, "/Projects/coronavirus/")  # don't forget trailing /
results_dir <- paste0(working_dir, "results/")  # assumes diretory exists
results_dir_custom <- paste0(results_dir, "custom/")  # assumes diretory exists


Corona_Cases.source_url <- "https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
Corona_Cases.US.source_url <- "https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv"
Corona_Deaths.US.source_url <- "https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv"
Corona_Deaths.source_url <- "https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"

Corona_Cases.fn <- paste0(working_dir, "data/", basename(Corona_Cases.source_url))
Corona_Cases.US.fn <- paste0(working_dir, "data/", basename(Corona_Cases.US.source_url))
Corona_Deaths.fn <- paste0(working_dir, "data/", basename(Corona_Deaths.source_url))
Corona_Deaths.US.fn <- paste0(working_dir, "data/", basename(Corona_Deaths.US.source_url))
default_theme <- theme_bw() + theme(text = element_text(size = 14))  # fix this
##------------------------------------------

FUNCTIONS

List of functions

function_name description
prediction_model outputs case estumate for given log-linear moder parameters slope and intercept
make_long converts input data to long format (specialized cases)
name_overlaps outputs the column names intersection and set diffs of two data frame
find_linear_index finds the first date at which linearaity occurs
##------------------------------------------
## FUNCTION: prediction_model
##------------------------------------------
## --- //// ----
# Takes days vs log10 (case) linear model parameters and a set of days since 100 cases and outputs a dataframe with total number of predicted cases for those days
## --- //// ----
prediction_model<-function(m=1,b=0,days=1){
  total_cases<-m*days+b
  total_cases.log<-log(total_cases,10)
  prediction<-data.frame(days=days,Total_confirmed_cases_perstate=total_cases)
  return(prediction)
}
##------------------------------------------

##------------------------------------------
## FUNCTION: make_long
##------------------------------------------
## --- //// ----
# Takes wide-format case data and converts into long format, using date and total cases as variable/values. Also enforces standardization/assumes data struture naming by using fixed variable name, value name, id.vars, 
## --- //// ----
make_long<-function(data_in,variable.name = "Date",
                   value.name = "Total_confirmed_cases",
                   id.vars=c("case_type","Province.State","Country.Region","Lat","Long","City","Population")){

long_data<-melt(data_in,
                id.vars = id.vars,
                variable.name=variable.name,
                value.name=value.name)
return(long_data)

}
##------------------------------------------

## THIS WILL BE IN UTILS AT SOME POINT
name_overlaps<-function(df1,df2){
i<-intersect(names(df1),
names(df2))
sd1<-setdiff(names(df1),
names(df2))
sd2<-setdiff(names(df2),names(df1))
cat("intersection:\n",paste(i,"\n"))
cat("in df1 but not df2:\n",paste(sd1,"\n"))
cat("in df2 but not df1:\n",paste(sd2,"\n"))
return(list("int"=i,"sd_1_2"=sd1,"sd_2_1"=sd2))
}

##------------------------------------------

##------------------------------------------
## FUNCTION: find_linear_index
##------------------------------------------
## --- //// ----
# Find date at which total case data is linear (for a given data frame) 
## --- //// ----

find_linear_index<-function(tmp,running_avg=5){
  tmp$Total_confirmed_cases_perstate.log<-log(tmp$Total_confirmed_cases_perstate,2)
  derivative<-data.frame(matrix(nrow = nrow(tmp),ncol = 4))
  names(derivative)<-c("m.time","mm.time","cumsum","date")
  
  # First derivative
  for(t in 2:nrow(tmp)){
    slope.t<- tmp[t,"Total_confirmed_cases_perstate.log"]- tmp[t-1,"Total_confirmed_cases_perstate.log"]
    derivative[t,"m.time"]<-slope.t
    derivative[t,"date"]<-as.Date(tmp[t,"Date"])
  }
  
  # Second derivative
  for(t in 2:nrow(derivative)){
    slope.t<- derivative[t,"m.time"]- derivative[t-1,"m.time"]
    derivative[t,"mm.time"]<-slope.t
  }
  
  #Compute running sum of second derivative (window = 5). Choose point at which within 0.2
  for(t in running_avg:nrow(derivative)){
    slope.t<- sum(abs(derivative[t:(t-4),"mm.time"])<0.2,na.rm = T)
    derivative[t,"cumsum"]<-slope.t
  }
  
  #Find date -5 from the stablility point
  linear_begin<-min(derivative[!is.na(derivative$cumsum) & derivative$cumsum==running_avg,"date"])-running_avg
  
  return(linear_begin)
}

READ IN DATA

# Q: do we want to archive previous versions? Maybe an auto git mv?

##------------------------------------------
## Download and read in latest data from github
##------------------------------------------
download.file(Corona_Cases.source_url, destfile = Corona_Cases.fn)
Corona_Totals.raw <- read.csv(Corona_Cases.fn, header = T, stringsAsFactors = F)

download.file(Corona_Cases.US.source_url, destfile = Corona_Cases.US.fn)
Corona_Totals.US.raw <- read.csv(Corona_Cases.US.fn, header = T, stringsAsFactors = F)

download.file(Corona_Deaths.source_url, destfile = Corona_Deaths.fn)
Corona_Deaths.raw <- read.csv(Corona_Deaths.fn, header = T, stringsAsFactors = F)

download.file(Corona_Deaths.US.source_url, destfile = Corona_Deaths.US.fn)
Corona_Deaths.US.raw <- read.csv(Corona_Deaths.US.fn, header = T, stringsAsFactors = F)

# latest date on all data:
paste("US deaths:", names(Corona_Deaths.US.raw)[ncol(Corona_Deaths.US.raw)])
## [1] "US deaths: X4.20.20"
paste("US total:", names(Corona_Totals.US.raw)[ncol(Corona_Totals.US.raw)])
## [1] "US total: X4.20.20"
paste("World deaths:", names(Corona_Deaths.raw)[ncol(Corona_Deaths.raw)])
## [1] "World deaths: X4.20.20"
paste("World total:", names(Corona_Totals.raw)[ncol(Corona_Totals.raw)])
## [1] "World total: X4.20.20"

PROCESS DATA

  • Convert to long format
  • Fix date formatting/convert to numeric date
  • Log10 transform total # cases
##------------------------------------------
## Combine death and total data frames
##------------------------------------------
Corona_Totals.raw$case_type<-"total"
Corona_Totals.US.raw$case_type<-"total"
Corona_Deaths.raw$case_type<-"death"
Corona_Deaths.US.raw$case_type<-"death"

# for some reason, Population listed in US death file but not for other data... Weird. When combining, all datasets will have this column, but US deaths is the only useful one.  
Corona_Totals.US.raw$Population<-"NA" 
Corona_Totals.raw$Population<-"NA"
Corona_Deaths.raw$Population<-"NA"

Corona_Cases.raw<-rbind(Corona_Totals.raw,Corona_Deaths.raw)
Corona_Cases.US.raw<-rbind(Corona_Totals.US.raw,Corona_Deaths.US.raw)
#TODO: custom utils- setdiff, intersect names... option to output in merging too
##------------------------------------------
# prepare raw datasets for eventual combining
##------------------------------------------
Corona_Cases.raw$City<-"NA" # US-level data has Cities
Corona_Cases.US.raw$Country_Region<-"US_state" # To differentiate from World-level stats

Corona_Cases.US.raw<-plyr::rename(Corona_Cases.US.raw,c("Province_State"="Province.State",
                                                  "Country_Region"="Country.Region",
                                                  "Long_"="Long",
                                                  "Admin2"="City"))


##------------------------------------------
## Convert to long format
##------------------------------------------
#JHU has a gross file format. It's in wide format with each column is the date in MM/DD/YY. So read this in as raw data but trasnform it to be better suited for analysis
# Furthermore, the World and US level data is formatted differently, containing different columns, etc. Recitfy this and combine the world-level stats with U.S. level stats.

Corona_Cases.long<-rbind(make_long(select(Corona_Cases.US.raw,-c(UID,iso2,iso3,code3,FIPS,Combined_Key))),
make_long(Corona_Cases.raw))


##------------------------------------------
## Fix date formatting, convert to numeric date
##------------------------------------------
Corona_Cases.long$Date<-gsub(Corona_Cases.long$Date,pattern = "^X",replacement = "0") # leading 0 read in as X
Corona_Cases.long$Date<-gsub(Corona_Cases.long$Date,pattern = "20$",replacement = "2020") # ends in .20 and not 2020
Corona_Cases.long$Date<-as.Date(Corona_Cases.long$Date,format = "%m.%d.%y")
Corona_Cases.long$Date.numeric<-as.numeric(Corona_Cases.long$Date)

kable(table(select(Corona_Cases.long,c("Country.Region","case_type"))),caption = "Number of death and total case longitudinal datapoints per geographical region")
Number of death and total case longitudinal datapoints per geographical region
death total
Afghanistan 90 90
Albania 90 90
Algeria 90 90
Andorra 90 90
Angola 90 90
Antigua and Barbuda 90 90
Argentina 90 90
Armenia 90 90
Australia 720 720
Austria 90 90
Azerbaijan 90 90
Bahamas 90 90
Bahrain 90 90
Bangladesh 90 90
Barbados 90 90
Belarus 90 90
Belgium 90 90
Belize 90 90
Benin 90 90
Bhutan 90 90
Bolivia 90 90
Bosnia and Herzegovina 90 90
Botswana 90 90
Brazil 90 90
Brunei 90 90
Bulgaria 90 90
Burkina Faso 90 90
Burma 90 90
Burundi 90 90
Cabo Verde 90 90
Cambodia 90 90
Cameroon 90 90
Canada 1350 1350
Central African Republic 90 90
Chad 90 90
Chile 90 90
China 2970 2970
Colombia 90 90
Congo (Brazzaville) 90 90
Congo (Kinshasa) 90 90
Costa Rica 90 90
Cote d’Ivoire 90 90
Croatia 90 90
Cuba 90 90
Cyprus 90 90
Czechia 90 90
Denmark 270 270
Diamond Princess 90 90
Djibouti 90 90
Dominica 90 90
Dominican Republic 90 90
Ecuador 90 90
Egypt 90 90
El Salvador 90 90
Equatorial Guinea 90 90
Eritrea 90 90
Estonia 90 90
Eswatini 90 90
Ethiopia 90 90
Fiji 90 90
Finland 90 90
France 990 990
Gabon 90 90
Gambia 90 90
Georgia 90 90
Germany 90 90
Ghana 90 90
Greece 90 90
Grenada 90 90
Guatemala 90 90
Guinea 90 90
Guinea-Bissau 90 90
Guyana 90 90
Haiti 90 90
Holy See 90 90
Honduras 90 90
Hungary 90 90
Iceland 90 90
India 90 90
Indonesia 90 90
Iran 90 90
Iraq 90 90
Ireland 90 90
Israel 90 90
Italy 90 90
Jamaica 90 90
Japan 90 90
Jordan 90 90
Kazakhstan 90 90
Kenya 90 90
Korea, South 90 90
Kosovo 90 90
Kuwait 90 90
Kyrgyzstan 90 90
Laos 90 90
Latvia 90 90
Lebanon 90 90
Liberia 90 90
Libya 90 90
Liechtenstein 90 90
Lithuania 90 90
Luxembourg 90 90
Madagascar 90 90
Malawi 90 90
Malaysia 90 90
Maldives 90 90
Mali 90 90
Malta 90 90
Mauritania 90 90
Mauritius 90 90
Mexico 90 90
Moldova 90 90
Monaco 90 90
Mongolia 90 90
Montenegro 90 90
Morocco 90 90
Mozambique 90 90
MS Zaandam 90 90
Namibia 90 90
Nepal 90 90
Netherlands 450 450
New Zealand 90 90
Nicaragua 90 90
Niger 90 90
Nigeria 90 90
North Macedonia 90 90
Norway 90 90
Oman 90 90
Pakistan 90 90
Panama 90 90
Papua New Guinea 90 90
Paraguay 90 90
Peru 90 90
Philippines 90 90
Poland 90 90
Portugal 90 90
Qatar 90 90
Romania 90 90
Russia 90 90
Rwanda 90 90
Saint Kitts and Nevis 90 90
Saint Lucia 90 90
Saint Vincent and the Grenadines 90 90
San Marino 90 90
Sao Tome and Principe 90 90
Saudi Arabia 90 90
Senegal 90 90
Serbia 90 90
Seychelles 90 90
Sierra Leone 90 90
Singapore 90 90
Slovakia 90 90
Slovenia 90 90
Somalia 90 90
South Africa 90 90
South Sudan 90 90
Spain 90 90
Sri Lanka 90 90
Sudan 90 90
Suriname 90 90
Sweden 90 90
Switzerland 90 90
Syria 90 90
Taiwan* 90 90
Tanzania 90 90
Thailand 90 90
Timor-Leste 90 90
Togo 90 90
Trinidad and Tobago 90 90
Tunisia 90 90
Turkey 90 90
Uganda 90 90
Ukraine 90 90
United Arab Emirates 90 90
United Kingdom 990 990
Uruguay 90 90
US 90 90
US_state 293490 293490
Uzbekistan 90 90
Venezuela 90 90
Vietnam 90 90
West Bank and Gaza 90 90
Western Sahara 90 90
Yemen 90 90
Zambia 90 90
Zimbabwe 90 90
# Decouple population and lat/long data, refactor to make it more tidy
metadata_columns<-c("Lat","Long","Population")
metadata<-unique(select(filter(Corona_Cases.long,case_type=="death"),c("Country.Region","Province.State","City",all_of(metadata_columns))))
Corona_Cases.long<-select(Corona_Cases.long,-all_of(metadata_columns))

# Some counties are not summarized on the country level. collapse all but US
Corona_Cases.long<-rbind.fill(ddply(filter(Corona_Cases.long,!Country.Region=="US_state"),c("case_type","Country.Region","Date","Date.numeric"),summarise,Total_confirmed_cases=sum(Total_confirmed_cases)),filter(Corona_Cases.long,Country.Region=="US_state"))

# Put total case and deaths side-by-side (wide)
Corona_Cases<-spread(Corona_Cases.long,key = case_type,value = Total_confirmed_cases)

#Compute moratlity rate
Corona_Cases$mortality_rate<-Corona_Cases$death/Corona_Cases$total

#TMP
Corona_Cases<-plyr::rename(Corona_Cases,c("total"="Total_confirmed_cases","death"="Total_confirmed_deaths"))

##------------------------------------------
## log10 transform total # cases
##------------------------------------------
Corona_Cases$Total_confirmed_cases.log<-log(Corona_Cases$Total_confirmed_cases,10)
Corona_Cases$Total_confirmed_deaths.log<-log(Corona_Cases$Total_confirmed_deaths,10)
##------------------------------------------
       
##------------------------------------------
## Compute # of days since 100th for US data
##------------------------------------------

# Find day that 100th case was found for Country/Province. NOTE: Non US countries may have weird provinces. For example, Frane is summairzed at the country level but also had 3 providences. I've only ensured the U.S. case100 works... so the case100_date for U.S. is summarized both for the entire country (regardless of state) and on a per-state level. 
# TODO: consider city-level summary as well. This data may be sparse

Corona_Cases<-merge(Corona_Cases,ddply(filter(Corona_Cases,Total_confirmed_cases>100),c("Country.Region"),summarise,case100_date=min(Date.numeric)))
Corona_Cases$Days_since_100<-Corona_Cases$Date.numeric-Corona_Cases$case100_date

##------------------------------------------
## Add population and lat/long data (CURRENTLY US ONLY)
##------------------------------------------

kable(filter(metadata,(is.na(Country.Region) | is.na(Population) )) %>% select(c("Country.Region","Province.State","City")) %>% unique(),caption = "Regions for which either population or Country is NA")
Regions for which either population or Country is NA
Country.Region Province.State City
# Drop missing data 
metadata<-filter(metadata,!(is.na(Country.Region) | is.na(Population) ))
# Convert remaining pop to numeric
metadata$Population<-as.numeric(metadata$Population)
## Warning: NAs introduced by coercion
# Add metadata to cases
Corona_Cases<-merge(Corona_Cases,metadata,all.x = T)

##------------------------------------------
## Compute total and death cases relative to population 
##------------------------------------------

Corona_Cases$Total_confirmed_cases.per100<-100*Corona_Cases$Total_confirmed_cases/Corona_Cases$Population
Corona_Cases$Total_confirmed_deaths.per100<-100*Corona_Cases$Total_confirmed_deaths/Corona_Cases$Population


##------------------------------------------
## Filter df for US state-wide stats
##------------------------------------------

Corona_Cases.US_state<-filter(Corona_Cases,Country.Region=="US_state" & Total_confirmed_cases>0 ) 
kable(table(select(Corona_Cases.US_state,c("Province.State"))),caption = "Number of longitudinal datapoints (total/death) per state")
Number of longitudinal datapoints (total/death) per state
Var1 Freq
Alabama 1854
Alaska 279
Arizona 536
Arkansas 1789
California 2140
Colorado 1730
Connecticut 313
Delaware 119
Diamond Princess 35
District of Columbia 36
Florida 2121
Georgia 4403
Grand Princess 36
Guam 36
Hawaii 192
Idaho 840
Illinois 2225
Indiana 2605
Iowa 2038
Kansas 1502
Kentucky 2368
Louisiana 1940
Maine 483
Maryland 809
Massachusetts 581
Michigan 2187
Minnesota 1866
Mississippi 2389
Missouri 2245
Montana 738
Nebraska 1005
Nevada 333
New Hampshire 356
New Jersey 841
New Mexico 683
New York 2003
North Carolina 2702
North Dakota 731
Northern Mariana Islands 21
Ohio 2404
Oklahoma 1630
Oregon 975
Pennsylvania 1988
Puerto Rico 36
Rhode Island 208
South Carolina 1443
South Dakota 971
Tennessee 2569
Texas 4750
Utah 525
Vermont 465
Virgin Islands 36
Virginia 3190
Washington 1396
West Virginia 1008
Wisconsin 1744
Wyoming 535
Corona_Cases.US_state<-merge(Corona_Cases.US_state,ddply(filter(Corona_Cases.US_state,Total_confirmed_cases>100),c("Province.State"),summarise,case100_date_state=min(Date.numeric)))
Corona_Cases.US_state$Days_since_100_state<-Corona_Cases.US_state$Date.numeric-Corona_Cases.US_state$case100_date_state

ANALYSIS

Q1: What is the trend in cases, mortality across geopgraphical regions?

Plot # of cases vs time
* For each geographical set:
* comparative longitudinal case trend (absolute & log scale)
* comparative longitudinal mortality trend
* death vs total correlation

question dataset x y color facet pch dimentions
comparative_longitudinal_case_trend long time log_cases geography none (case type?) case_type [15, 50, 4] geography x (2 scale?) case type
comparative longitudinal case trend long time cases geography case_type ? [15, 50, 4] geography x (2+ scale) case type
comparative longitudinal mortality trend wide time mortality rate geography none none [15, 50, 4] geography
death vs total correlation wide cases deaths geography none none [15, 50, 4] geography
# total cases vs time
# death cases vs time
# mortality rate vs time
# death vs mortality


  # death vs mortality
  # total & death case vs time (same plot)

#<question> <x> <y> <colored> <facet> <dataset>
## trend in case/deaths over time, comapred across regions <time> <log cases> <geography*> <none> <.wide>
## trend in case/deaths over time, comapred across regions <time> <cases> <geography*> <case_type> <.long>
## trend in mortality rate over time, comapred across regions <time> <mortality rate> <geography*> <none>
## how are death/mortality related/correlated? <time> <log cases> <geography*> <none>
## how are death and case load correlated? <cases> <deaths>

# lm for each?? - > apply lm from each region starting from 100th case. m, b associated with each.
    # input: geographical regsion, logcase vs day (100th case)
    # output: m, b for each geographical region ID



#total/death on same plot-  diffeer by 2 logs, so when plotting log, use pch. when plotting absolute, need to use free scales
#when plotting death and case on same, melt. 

#CoronaCases - > filter sets (3)
  #world - choose countries with sufficent data

N<-ddply(filter(Corona_Cases,Total_confirmed_cases>100),c("Country.Region"),summarise,n=length(Country.Region))
ggplot(filter(N,n<100),aes(x=n))+
  geom_histogram()+
  default_theme+
  ggtitle("Distribution of number of days with at least 100 confirmed cases for each region")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

kable(arrange(N,-n),caption="Sorted number of days with at least 100 confirmed cases")
Sorted number of days with at least 100 confirmed cases
Country.Region n
US_state 9348
China 90
Diamond Princess 71
Korea, South 61
Japan 60
Italy 58
Iran 55
Singapore 52
France 51
Germany 51
Spain 50
US 49
Switzerland 47
United Kingdom 47
Belgium 46
Netherlands 46
Norway 46
Sweden 46
Austria 44
Malaysia 43
Australia 42
Bahrain 42
Denmark 42
Canada 41
Qatar 41
Iceland 40
Brazil 39
Czechia 39
Finland 39
Greece 39
Iraq 39
Israel 39
Portugal 39
Slovenia 39
Egypt 38
Estonia 38
India 38
Ireland 38
Kuwait 38
Philippines 38
Poland 38
Romania 38
Saudi Arabia 38
Indonesia 37
Lebanon 37
San Marino 37
Thailand 37
Chile 36
Pakistan 36
Luxembourg 35
Peru 35
Russia 35
Ecuador 34
Slovakia 34
South Africa 34
United Arab Emirates 34
Armenia 33
Colombia 33
Croatia 33
Mexico 33
Panama 33
Serbia 33
Taiwan* 33
Turkey 33
Argentina 32
Bulgaria 32
Latvia 32
Algeria 31
Costa Rica 31
Dominican Republic 31
Hungary 31
Uruguay 31
Andorra 30
Bosnia and Herzegovina 30
Jordan 30
Lithuania 30
Morocco 30
New Zealand 30
North Macedonia 30
Vietnam 30
Albania 29
Cyprus 29
Malta 29
Moldova 29
Brunei 28
Burkina Faso 28
Sri Lanka 28
Tunisia 28
Ukraine 27
Azerbaijan 26
Ghana 26
Kazakhstan 26
Oman 26
Senegal 26
Venezuela 26
Afghanistan 25
Cote d’Ivoire 25
Cuba 24
Mauritius 24
Uzbekistan 24
Cambodia 23
Cameroon 23
Honduras 23
Nigeria 23
West Bank and Gaza 23
Belarus 22
Georgia 22
Bolivia 21
Kosovo 21
Kyrgyzstan 21
Montenegro 21
Congo (Kinshasa) 20
Kenya 19
Niger 18
Guinea 17
Rwanda 17
Trinidad and Tobago 17
Paraguay 16
Bangladesh 15
Djibouti 13
El Salvador 12
Guatemala 11
Madagascar 10
Mali 9
Congo (Brazzaville) 6
Jamaica 6
Gabon 4
Somalia 4
Tanzania 4
Ethiopia 3
Burma 2
Sudan 1
# Pick top 15 countries with data
max_colors<-12
# find way to fix this- China has diff provences. Plot doesnt look right...
sufficient_data<-arrange(filter(N,!Country.Region %in% c("US_state", "Diamond Princess")),-n)[1:max_colors,]
kable(sufficient_data,caption = paste0("Top ",max_colors," countries with sufficient data"))
Top 12 countries with sufficient data
Country.Region n
China 90
Korea, South 61
Japan 60
Italy 58
Iran 55
Singapore 52
France 51
Germany 51
Spain 50
US 49
Switzerland 47
United Kingdom 47
Corona_Cases.world<-filter(Corona_Cases,Country.Region %in% c(sufficient_data$Country.Region))


  #us 
  #    - by state
Corona_Cases.US<-filter(Corona_Cases,Country.Region=="US" & Total_confirmed_cases>0)
# summarize 
#!City %in% c("Unassigned") 
  #    - specific cities
#mortality_rate!=Inf & mortality_rate<=1
Corona_Cases.UScity<-filter(Corona_Cases,Province.State %in% c("Pennsylvania","Maryland","New York","New Jersey") & City %in% c("Bucks","Baltimore City", "New York","Burlington"))

measure_vars_long<-c("Total_confirmed_cases.log","Total_confirmed_cases","Total_confirmed_deaths","Total_confirmed_deaths.log")
melt_arg_list<-list(variable.name = "case_type",value.name = "cases",measure.vars = c("Total_confirmed_cases","Total_confirmed_deaths"))
melt_arg_list$data=NULL


melt_arg_list$data=select(Corona_Cases.world,-ends_with(match = "log"))
Corona_Cases.world.long<-do.call(melt,melt_arg_list)
melt_arg_list$data=select(Corona_Cases.UScity,-ends_with(match = "log"))
Corona_Cases.UScity.long<-do.call(melt,melt_arg_list)
melt_arg_list$data=select(Corona_Cases.US_state,-ends_with(match = "log"))
Corona_Cases.US_state.long<-do.call(melt,melt_arg_list)

Corona_Cases.world.long$cases.log<-log(Corona_Cases.world.long$cases,10)
Corona_Cases.US_state.long$cases.log<-log(Corona_Cases.US_state.long$cases,10)
Corona_Cases.UScity.long$cases.log<-log(Corona_Cases.UScity.long$cases,10)


# what is the current death and total case load for US? For world? For states?
#-absolute
#-log

# what is mortality rate (US, world)
#-absolute

#how is death and case correlated? (US, world)
#-absolute
#Corona_Cases.US<-filter(Corona_Cases,Country.Region=="US" & Total_confirmed_cases>0)
#Corona_Cases.US.case100<-filter(Corona_Cases.US, Days_since_100>=0)
# linear model parameters
#(model_fit<-lm(formula = Total_confirmed_cases.log~Days_since_100,data= Corona_Cases.US.case100 ))

#(slope<-model_fit$coefficients[2])
#(intercept<-model_fit$coefficients[1])

# Correlation coefficient
#cor(x = Corona_Cases.US.case100$Days_since_100,y = Corona_Cases.US.case100$Total_confirmed_cases.log)

##------------------------------------------
## Plot World Data
##------------------------------------------
# Timestamp for world
timestamp_plot.world<-paste("Most recent date for which data available:",max(Corona_Cases.world$Date))#timestamp(quiet = T,prefix = "Updated ",suffix = " (EST)")


# Base template for plots
baseplot.world<-ggplot(data=NULL,aes(x=Days_since_100,col=Country.Region))+
  default_theme+
  scale_color_brewer(type = "qualitative",palette = "Paired")+
  ggtitle(paste("Log10 cases over time,",timestamp_plot.world))+
  theme(legend.position = "bottom",plot.title = element_text(size=12))


##/////////////////////////
### Plot Longitudinal cases

(Corona_Cases.world.long.plot<-baseplot.world+
    geom_point(data=Corona_Cases.world.long,aes(y=cases))+
    geom_line(data=Corona_Cases.world.long,aes(y=cases))+
    facet_wrap(~case_type,scales = "free_y",ncol=1)+
    ggtitle(timestamp_plot.world)
    )

(Corona_Cases.world.loglong.plot<-baseplot.world+
    geom_point(data=Corona_Cases.world.long,aes(y=cases.log))+
    geom_line(data=Corona_Cases.world.long,aes(y=cases.log))+
    facet_wrap(~case_type,scales = "free_y",ncol=1)+
    ggtitle(timestamp_plot.world))

##/////////////////////////
### Plot Longitudinal mortality rate

(Corona_Cases.world.mortality.plot<-baseplot.world+
    geom_point(data=Corona_Cases.world,aes(y=mortality_rate))+
    geom_line(data=Corona_Cases.world,aes(y=mortality_rate))+
    ylim(c(0,0.3))+
    ggtitle(timestamp_plot.world))
## Warning: Removed 100 rows containing missing values (geom_point).
## Warning: Removed 100 row(s) containing missing values (geom_path).

##/////////////////////////
### Plot death vs total case correlation

(Corona_Cases.world.casecor.plot<-ggplot(Corona_Cases.world,aes(x=Total_confirmed_cases,y=Total_confirmed_deaths,col=Country.Region))+
  geom_point()+
  geom_line()+
  default_theme+
  scale_color_brewer(type = "qualitative",palette = "Paired")+
  ggtitle(paste("Log10 cases over time,",timestamp_plot.world))+
  theme(legend.position = "bottom",plot.title = element_text(size=12))+
    ggtitle(timestamp_plot.world))

### Write polots

write_plot(Corona_Cases.world.long.plot,wd = results_dir)
## [1] "/Users/stevensmith/Projects/coronavirus/results/Corona_Cases.world.long.plot.png"
write_plot(Corona_Cases.world.loglong.plot,wd = results_dir)
## [1] "/Users/stevensmith/Projects/coronavirus/results/Corona_Cases.world.loglong.plot.png"
write_plot(Corona_Cases.world.mortality.plot,wd = results_dir)
## Warning: Removed 100 rows containing missing values (geom_point).

## Warning: Removed 100 row(s) containing missing values (geom_path).
## [1] "/Users/stevensmith/Projects/coronavirus/results/Corona_Cases.world.mortality.plot.png"
write_plot(Corona_Cases.world.casecor.plot,wd = results_dir)
## [1] "/Users/stevensmith/Projects/coronavirus/results/Corona_Cases.world.casecor.plot.png"
##------------------------------------------
## Plot US State Data
##-----------------------------------------

baseplot.US<-ggplot(data=NULL,aes(x=Days_since_100_state,col=case_type))+
  default_theme+
  facet_wrap(~Province.State)+
  ggtitle(paste("Log10 cases over time,",timestamp_plot.world))

Corona_Cases.US_state.long.plot<-baseplot.US+geom_point(data=Corona_Cases.US_state.long,aes(y=cases.log))
##------------------------------------------
## Plot US City Data
##-----------------------------------------

Corona_Cases.US.plotdata<-filter(Corona_Cases.US_state,Province.State %in% c("Pennsylvania","Maryland","New York","New Jersey") &
                                   City %in% c("Bucks","Baltimore City", "New York","Burlington") &
                                   Total_confirmed_cases>0) 
timestamp_plot<-paste("Most recent date for which data available:",max(Corona_Cases.US.plotdata$Date))#timestamp(quiet = T,prefix = "Updated ",suffix = " (EST)")

city_colors<-c("Bucks"='#beaed4',"Baltimore City"='#386cb0', "New York"='#7fc97f',"Burlington"='#fdc086')

##/////////////////////////
### Plot death vs total case correlation

(Corona_Cases.city.loglong.plot<-ggplot(melt(Corona_Cases.US.plotdata,measure.vars = c("Total_confirmed_cases.log","Total_confirmed_deaths.log"),variable.name = "case_type",value.name = "cases"),aes(x=Date,y=cases,col=City,pch=case_type))+
  geom_point(size=4)+
    geom_line()+
  default_theme+
  #facet_wrap(~case_type)+
    ggtitle(paste("Log10 total and death cases over time,",timestamp_plot))+
theme(legend.position = "bottom",plot.title = element_text(size=12))+
    scale_color_manual(values = city_colors))

(Corona_Cases.city.long.plot<-ggplot(filter(Corona_Cases.US.plotdata,Province.State !="New York"),aes(x=Date,y=Total_confirmed_cases,col=City))+
  geom_point(size=4)+
  geom_line()+
  default_theme+
  facet_grid(~Province.State,scales = "free_y")+
  ggtitle(paste("MD, PA, NJ total cases over time,",timestamp_plot))+
  theme(legend.position = "bottom",plot.title = element_text(size=12))+
  scale_color_manual(values = city_colors))

(Corona_Cases.city.mortality.plot<-ggplot(Corona_Cases.US.plotdata,aes(x=Date,y=mortality_rate,col=City))+
  geom_point(size=3)+
  geom_line(size=2)+
  default_theme+
  ggtitle(paste("Mortality rate (deaths/total) over time,",timestamp_plot))+
  theme(legend.position = "bottom",plot.title = element_text(size=12))+
  scale_color_manual(values = city_colors))

(Corona_Cases.city.casecor.plot<-ggplot(filter(Corona_Cases.US.plotdata,Province.State !="New York"),aes(x=Total_confirmed_deaths,y=Total_confirmed_cases,col=City))+
  geom_point(size=3)+
  geom_line(size=2)+
  default_theme+
  ggtitle(paste("Correlation of death vs total cases,",timestamp_plot))+
  theme(legend.position = "bottom",plot.title = element_text(size=12))+
  scale_color_manual(values = city_colors))

(Corona_Cases.city.long.normalized.plot<-ggplot(filter(Corona_Cases.US.plotdata,Province.State !="New York"),aes(x=Date,y=Total_confirmed_cases.per100,col=City))+
  geom_point(size=4)+
  geom_line()+
  default_theme+
  facet_grid(~Province.State)+
  ggtitle(paste("MD, PA, NJ total cases over time per 100 people,",timestamp_plot))+
  theme(legend.position = "bottom",plot.title = element_text(size=12))+
  scale_color_manual(values = city_colors))

write_plot(Corona_Cases.city.long.plot,wd = results_dir_custom)
## [1] "/Users/stevensmith/Projects/coronavirus/results/custom/Corona_Cases.city.long.plot.png"
write_plot(Corona_Cases.city.loglong.plot,wd = results_dir_custom)
## [1] "/Users/stevensmith/Projects/coronavirus/results/custom/Corona_Cases.city.loglong.plot.png"
write_plot(Corona_Cases.city.mortality.plot,wd = results_dir_custom)
## [1] "/Users/stevensmith/Projects/coronavirus/results/custom/Corona_Cases.city.mortality.plot.png"
write_plot(Corona_Cases.city.casecor.plot,wd = results_dir_custom)
## [1] "/Users/stevensmith/Projects/coronavirus/results/custom/Corona_Cases.city.casecor.plot.png"
write_plot(Corona_Cases.city.long.normalized.plot,wd = results_dir_custom)
## [1] "/Users/stevensmith/Projects/coronavirus/results/custom/Corona_Cases.city.long.normalized.plot.png"

Q1b what is the model

Fit the cases to a linear model 1. Find time at which the case vs date becomes linear in each plot
2. Fit linear model for each city

# What is the predict # of cases for the next few days?
# How is the model performing historically?

Corona_Cases.US_state.summary<-ddply(Corona_Cases.US_state,
                                     c("Province.State","Date"),
                                     summarise,
                                     Total_confirmed_cases_perstate=sum(Total_confirmed_cases)) %>% 
    filter(Total_confirmed_cases_perstate>100)

# Compute the states with the most cases (for coloring and for linear model)
top_states_totals<-head(ddply(Corona_Cases.US_state.summary,c("Province.State"),summarise, Total_confirmed_cases_perstate.max=max(Total_confirmed_cases_perstate)) %>% arrange(-Total_confirmed_cases_perstate.max),n=max_colors)

kable(top_states_totals,caption = "Top 12 States, total count ")
Top 12 States, total count
Province.State Total_confirmed_cases_perstate.max
New York 253060
New Jersey 88722
Massachusetts 38077
Pennsylvania 33914
California 33686
Michigan 32000
Illinois 31513
Florida 27059
Louisiana 24523
Connecticut 19815
Texas 19751
Georgia 19407
top_states<-top_states_totals$Province.State

# Manually fix states so that Maryland is switched out for New York
top_states_modified<-c(top_states[top_states !="New York"],"Maryland")

# Plot with all states:
(Corona_Cases.US_state.summary.plot<-ggplot(Corona_Cases.US_state.summary,aes(x=Date,y=Total_confirmed_cases_perstate))+
  geom_point()+
  geom_point(data=filter(Corona_Cases.US_state.summary,Province.State %in% top_states),aes(col=Province.State))+
  scale_color_brewer(type = "qualitative",palette = "Paired")+
  default_theme+
  theme(axis.text.x = element_text(angle=45,hjust=1),legend.position = "bottom")+
  ggtitle("Total confirmed cases per state, top 12 colored")+
  scale_x_date(date_breaks="1 week",date_minor_breaks="1 day"))

##------------------------------------------
## Fit linear model to time vs total cases
##-----------------------------------------

# First, find the date at which each state's cases vs time becomes lienar (2nd derivative is about 0)
li<-ddply(Corona_Cases.US_state.summary,c("Province.State"),find_linear_index)

# Compute linear model for each state starting at the point at which data becomes linear
for(i in 1:nrow(li)){
  Province.State.i<-li[i,"Province.State"]
  date.i<-li[i,"V1"]
  data.i<-filter(Corona_Cases.US_state.summary,Province.State==Province.State.i & as.numeric(Date) >= date.i)
  model_results<-lm(data.i,formula = Total_confirmed_cases_perstate~Date)
  slope<-model_results$coefficients[2]
  intercept<-model_results$coefficients[1]
  li[li$Province.State==Province.State.i,"m"]<-slope
  li[li$Province.State==Province.State.i,"b"]<-intercept
  }

# Compute top state case load with fitted model

(Corona_Cases.US_state.lm.plot<-ggplot(filter(Corona_Cases.US_state.summary,Province.State %in% top_states_modified ))+
    geom_abline(data=filter(li,Province.State %in% top_states_modified),
                aes(slope = m,intercept = b,col=Province.State),lty=2)+
    geom_point(aes(x=Date,y=Total_confirmed_cases_perstate,col=Province.State))+
    scale_color_brewer(type = "qualitative",palette = "Paired")+
    default_theme+
    theme(axis.text.x = element_text(angle=45,hjust=1),legend.position = "bottom")+
    ggtitle("Total confirmed cases per state, top 12 colored")+
    scale_x_date(date_breaks="1 week",date_minor_breaks="1 day"))

##------------------------------------------
## Predict the number of total cases over the next week
##-----------------------------------------

predicted_days<-c(0,1,2,3,7)+as.numeric(as.Date("2020-04-20"))

predicted_days_df<-data.frame(matrix(ncol=3))
names(predicted_days_df)<-c("Province.State","days","Total_confirmed_cases_perstate")

# USe model parameters to estiamte case loads
for(state.i in top_states_modified){
  predicted_days_df<-rbind(predicted_days_df,
                           data.frame(Province.State=state.i,
                                      prediction_model(m = li[li$Province.State==state.i,"m"],
                                                       b =li[li$Province.State==state.i,"b"] ,
                                                       days =predicted_days )))
  }

predicted_days_df$Date<-as.Date(predicted_days_df$days,origin="1970-01-01")

kable(predicted_days_df,caption = "Predicted total cases over the next week for selected states")
Predicted total cases over the next week for selected states
Province.State days Total_confirmed_cases_perstate Date
NA NA NA NA
New Jersey 18372 88643.21 2020-04-20
New Jersey 18373 92062.61 2020-04-21
New Jersey 18374 95482.01 2020-04-22
New Jersey 18375 98901.41 2020-04-23
New Jersey 18379 112579.01 2020-04-27
Massachusetts 18372 34300.92 2020-04-20
Massachusetts 18373 35507.08 2020-04-21
Massachusetts 18374 36713.24 2020-04-22
Massachusetts 18375 37919.40 2020-04-23
Massachusetts 18379 42744.05 2020-04-27
Pennsylvania 18372 32204.77 2020-04-20
Pennsylvania 18373 33431.39 2020-04-21
Pennsylvania 18374 34658.01 2020-04-22
Pennsylvania 18375 35884.62 2020-04-23
Pennsylvania 18379 40791.10 2020-04-27
California 18372 31015.97 2020-04-20
California 18373 32041.54 2020-04-21
California 18374 33067.11 2020-04-22
California 18375 34092.69 2020-04-23
California 18379 38194.98 2020-04-27
Michigan 18372 33714.44 2020-04-20
Michigan 18373 34937.87 2020-04-21
Michigan 18374 36161.30 2020-04-22
Michigan 18375 37384.73 2020-04-23
Michigan 18379 42278.44 2020-04-27
Illinois 18372 28503.01 2020-04-20
Illinois 18373 29504.79 2020-04-21
Illinois 18374 30506.57 2020-04-22
Illinois 18375 31508.34 2020-04-23
Illinois 18379 35515.46 2020-04-27
Florida 18372 26436.11 2020-04-20
Florida 18373 27341.82 2020-04-21
Florida 18374 28247.54 2020-04-22
Florida 18375 29153.25 2020-04-23
Florida 18379 32776.10 2020-04-27
Louisiana 18372 27020.92 2020-04-20
Louisiana 18373 28006.02 2020-04-21
Louisiana 18374 28991.13 2020-04-22
Louisiana 18375 29976.23 2020-04-23
Louisiana 18379 33916.65 2020-04-27
Connecticut 18372 18525.25 2020-04-20
Connecticut 18373 19295.74 2020-04-21
Connecticut 18374 20066.24 2020-04-22
Connecticut 18375 20836.74 2020-04-23
Connecticut 18379 23918.73 2020-04-27
Texas 18372 19177.39 2020-04-20
Texas 18373 19896.28 2020-04-21
Texas 18374 20615.16 2020-04-22
Texas 18375 21334.04 2020-04-23
Texas 18379 24209.57 2020-04-27
Georgia 18372 18076.99 2020-04-20
Georgia 18373 18740.31 2020-04-21
Georgia 18374 19403.63 2020-04-22
Georgia 18375 20066.95 2020-04-23
Georgia 18379 22720.23 2020-04-27
Maryland 18372 11876.61 2020-04-20
Maryland 18373 12330.74 2020-04-21
Maryland 18374 12784.87 2020-04-22
Maryland 18375 13239.00 2020-04-23
Maryland 18379 15055.53 2020-04-27
##------------------------------------------
## Write plots
##-----------------------------------------

write_plot(Corona_Cases.US_state.summary.plot,wd = results_dir)
## [1] "/Users/stevensmith/Projects/coronavirus/results/Corona_Cases.US_state.summary.plot.png"
write_plot(Corona_Cases.US_state.lm.plot,wd = results_dir)
## [1] "/Users/stevensmith/Projects/coronavirus/results/Corona_Cases.US_state.lm.plot.png"
##------------------------------------------
## Write tables
##-----------------------------------------

write.csv(predicted_days_df,file = paste0(results_dir,"predicted_total_cases_days.csv"),quote = F,row.names = F)

Q2: What is the predicted number of cases?

What is the prediction of COVID-19 based on model thus far? Additional questions:

WHy did it take to day 40 to start a log linear trend? How long will it be till x number of cases? When will the plateu happen? Are any effects noticed with social distancing? Delays

##------------------------------------------
## Prediction and Prediction Accuracy
##------------------------------------------


today_num<-max(Corona_Cases.US$Days_since_100)
predicted_days<-today_num+c(1,2,3,7)

#mods = dlply(mydf, .(x3), lm, formula = y ~ x1 + x2)
#today:
Corona_Cases.US[Corona_Cases.US$Days_since_100==(today_num-1),]
Corona_Cases.US[Corona_Cases.US$Days_since_100==today_num,]
Corona_Cases.US$type<-"Historical"


#prediction_values<-prediction_model(m=slope,b=intercept,days = predicted_days)$Total_confirmed_cases

#histoical_model<-data.frame(date=today_num,m=slope,b=intercept)

# model for previous y days
#historical_model_predictions<-data.frame(day_x=NULL,Days_since_100=NULL,Total_confirmed_cases=NULL,Total_confirmed_cases.log=NULL)
# for(i in c(1,2,3,4,5,6,7,8,9,10)){
#   #i<-1
# day_x<-today_num-i # 1, 2, 3, 4
# day_x_nextweek<-day_x+c(1,2,3)
# model_fit_x<-lm(data = filter(Corona_Cases.US.case100,Days_since_100 < day_x),formula = Total_confirmed_cases.log~Days_since_100)
# prediction_day_x_nextweek<-prediction_model(m = model_fit_x$coefficients[2],b = model_fit_x$coefficients[1],days = day_x_nextweek)
# prediction_day_x_nextweek$type<-"Predicted"
# acutal_day_x_nextweek<-filter(Corona_Cases.US,Days_since_100 %in% day_x_nextweek) %>% select(c(Days_since_100,Total_confirmed_cases,Total_confirmed_cases.log))
# acutal_day_x_nextweek$type<-"Historical"
# historical_model_predictions.i<-data.frame(day_x=day_x,rbind(acutal_day_x_nextweek,prediction_day_x_nextweek))
# historical_model_predictions<-rbind(historical_model_predictions.i,historical_model_predictions)
# }

#historical_model_predictions.withHx<-rbind.fill(historical_model_predictions,data.frame(Corona_Cases.US,type="Historical"))
#historical_model_predictions.withHx$Total_confirmed_cases.log2<-log(historical_model_predictions.withHx$Total_confirmed_cases,2)

(historical_model_predictions.plot<-ggplot(historical_model_predictions.withHx,aes(x=Days_since_100,y=Total_confirmed_cases.log,col=type))+
    geom_point(size=3)+
    default_theme+
    theme(legend.position = "bottom")+ 
      #geom_abline(slope = slope,intercept =intercept,lty=2)+
    #facet_wrap(~case_type,ncol=1)+
    scale_color_manual(values = c("Historical"="#377eb8","Predicted"="#e41a1c")))
write_plot(historical_model_predictions.plot,wd=results_dir)

Q3: What is the effect on social distancing, descreased mobility on case load?

Load data from Google which compoutes % change in user mobility relative to baseline for * Recreation
* Workplace
* Residence
* Park
* Grocery

Data from https://www.google.com/covid19/mobility/

# See pre-processing section for script on gathering mobility data

# UNDER DEVELOPMENT

mobility<-read.csv("/Users/stevensmith/Projects/MIT_COVID19/mobility.csv",header = T,stringsAsFactors = F)
#mobility$Retail_Recreation<-as.numeric(sub(mobility$Retail_Recreation,pattern = "%",replacement = ""))
#mobility$Workplace<-as.numeric(sub(mobility$Workplace,pattern = "%",replacement = ""))
#mobility$Residential<-as.numeric(sub(mobility$Residential,pattern = "%",replacement = ""))

##------------------------------------------
## Show relationship between mobility and caseload
##------------------------------------------
mobility$County<-gsub(mobility$County,pattern = " County",replacement = "")
Corona_Cases.US_state.mobility<-merge(Corona_Cases.US_state,plyr::rename(mobility,c("State"="Province.State","County"="City")))

#Corona_Cases.US_state.tmp<-merge(metadata,Corona_Cases.US_state.tmp)
# Needs to happen upsteam, see todos
#Corona_Cases.US_state.tmp$Total_confirmed_cases.perperson<-Corona_Cases.US_state.tmp$Total_confirmed_cases/as.numeric(Corona_Cases.US_state.tmp$Population)
mobility_measures<-c("Retail_Recreation","Grocery_Pharmacy","Parks","Transit","Workplace","Residential")

plot_data<-filter(Corona_Cases.US_state.mobility, Date.numeric==max(Corona_Cases.US_state$Date.numeric) ) %>% melt(measure.vars=mobility_measures) 
plot_data$value<-as.numeric(gsub(plot_data$value,pattern = "%",replacement = ""))
plot_data<-filter(plot_data,!is.na(value))

(mobility.plot<-ggplot(filter(plot_data,Province.State %in% c("Pennsylvania","Maryland","New Jersey","California","Delaware","Connecticut")),aes(y=Total_confirmed_cases.per100,x=value))+geom_point()+
  facet_grid(Province.State~variable,scales = "free")+
  xlab("Mobility change from baseline (%)")+
  ylab(paste0("Confirmed cases per 100 people(Today)"))+
  default_theme+
  ggtitle("Mobility change vs cases"))

(mobility.global.plot<-ggplot(plot_data,aes(y=Total_confirmed_cases.per100,x=value))+geom_point()+
  facet_wrap(~variable,scales = "free")+
  xlab("Mobility change from baseline (%)")+
  ylab(paste0("Confirmed cases (Today) per 100 people"))+
  default_theme+
  ggtitle("Mobility change vs cases"))

plot_data.permobility_summary<-ddply(plot_data,c("Province.State","variable"),summarise,cor=cor(y =Total_confirmed_cases.per100,x=value),median_change=median(x=value)) %>% arrange(-abs(cor))

kable(plot_data.permobility_summary,caption = "Ranked per-state mobility correlation with total confirmed cases")
Ranked per-state mobility correlation with total confirmed cases
Province.State variable cor median_change
Alaska Transit -1.0000000 -63.0
Delaware Retail_Recreation 1.0000000 -39.5
Delaware Grocery_Pharmacy 1.0000000 -17.5
Delaware Parks -1.0000000 20.5
Delaware Transit 1.0000000 -37.0
Delaware Workplace 1.0000000 -37.0
Delaware Residential -1.0000000 14.0
Hawaii Parks 0.9855500 -72.0
Hawaii Transit 0.9682915 -89.0
Alaska Residential 0.9646018 13.0
Utah Workplace -0.9461129 -37.0
Vermont Parks 0.9218804 -35.5
Hawaii Grocery_Pharmacy 0.9196704 -34.0
New Hampshire Parks 0.9113509 -20.0
South Dakota Parks 0.9065089 -26.0
Connecticut Grocery_Pharmacy -0.8993428 -6.0
Utah Retail_Recreation -0.8962660 -40.0
Hawaii Retail_Recreation 0.8606345 -56.0
Massachusetts Workplace -0.8533862 -39.0
Alaska Grocery_Pharmacy -0.8494172 -7.0
Utah Grocery_Pharmacy -0.8139552 -4.0
Connecticut Transit -0.7919371 -50.0
Rhode Island Workplace -0.7659157 -39.5
Utah Residential -0.7538373 12.0
North Dakota Parks -0.7395464 -34.0
New Mexico Parks 0.7200952 -31.5
Utah Transit -0.7198309 -18.0
New Jersey Workplace -0.7131262 -44.0
North Dakota Retail_Recreation -0.7054369 -43.5
Massachusetts Retail_Recreation -0.6899492 -44.0
California Retail_Recreation -0.6743051 -44.0
Utah Parks -0.6706987 17.0
California Workplace -0.6698821 -36.0
New York Workplace -0.6697008 -34.5
Vermont Grocery_Pharmacy -0.6642784 -25.0
Maine Transit -0.6634212 -50.0
New Jersey Retail_Recreation -0.6475270 -62.5
Kansas Parks 0.6423213 72.0
Connecticut Residential 0.6336191 14.0
Maryland Workplace -0.6306494 -35.0
New York Retail_Recreation -0.6285045 -46.0
Rhode Island Residential -0.6194063 18.5
California Residential 0.6123256 14.0
California Grocery_Pharmacy -0.6060930 -12.0
Montana Workplace -0.5901126 -40.5
Nevada Transit -0.5848795 -20.0
California Transit -0.5835718 -42.0
Massachusetts Grocery_Pharmacy -0.5814299 -7.0
Rhode Island Retail_Recreation -0.5704169 -45.0
Alaska Workplace -0.5679702 -34.0
Connecticut Workplace -0.5565283 -39.0
West Virginia Parks 0.5553745 -27.0
New Jersey Parks -0.5357858 -6.0
Montana Transit -0.5330404 -41.0
Maine Workplace -0.5306826 -30.0
Nevada Retail_Recreation -0.5168522 -43.0
Montana Retail_Recreation -0.5165435 -51.0
Idaho Workplace -0.5154350 -29.5
Minnesota Parks 0.4996525 -10.0
Connecticut Retail_Recreation -0.4823740 -45.0
Montana Parks -0.4797368 -58.0
Kansas Grocery_Pharmacy -0.4790038 -14.0
New Jersey Grocery_Pharmacy -0.4713537 2.5
Arizona Grocery_Pharmacy -0.4711334 -15.0
Maine Parks 0.4705601 -31.0
Montana Residential 0.4580929 14.0
Idaho Transit -0.4575232 -30.0
Rhode Island Parks 0.4514869 52.0
Vermont Residential 0.4431676 11.5
Pennsylvania Workplace -0.4418295 -36.0
Arkansas Parks -0.4362584 -12.0
Massachusetts Transit -0.4334904 -45.0
New Mexico Residential 0.4308095 13.5
New Jersey Transit -0.4283798 -50.5
Idaho Grocery_Pharmacy -0.4261301 -4.0
New York Parks 0.4257291 20.0
New York Transit -0.4242627 -48.0
Montana Grocery_Pharmacy -0.4058904 -16.0
Pennsylvania Retail_Recreation -0.3963362 -45.0
Michigan Workplace -0.3963036 -40.0
Virginia Retail_Recreation -0.3915238 -35.0
Colorado Residential 0.3868972 14.0
Vermont Retail_Recreation 0.3809664 -57.0
Illinois Transit -0.3809151 -31.0
Idaho Retail_Recreation -0.3791350 -41.0
Florida Parks -0.3772689 -43.0
Maryland Grocery_Pharmacy -0.3742974 -10.0
New Mexico Grocery_Pharmacy -0.3726445 -11.5
Colorado Workplace -0.3679019 -39.0
Virginia Transit -0.3677731 -33.0
Alabama Workplace -0.3677454 -29.0
Wisconsin Transit -0.3662530 -23.5
Arizona Transit 0.3587083 -38.0
Alaska Retail_Recreation 0.3539900 -39.0
New Mexico Retail_Recreation -0.3477916 -42.5
Oregon Parks 0.3463564 16.5
North Dakota Grocery_Pharmacy -0.3436992 -9.5
Arizona Residential 0.3405418 13.0
Maryland Retail_Recreation -0.3353905 -39.0
Florida Residential 0.3343833 14.0
Minnesota Transit -0.3334180 -28.5
Rhode Island Grocery_Pharmacy 0.3330692 -7.5
Colorado Retail_Recreation -0.3311956 -44.0
Colorado Parks -0.3266929 2.0
California Parks -0.3255591 -38.0
Arizona Retail_Recreation -0.3228722 -42.5
South Dakota Transit -0.3211172 -40.0
Washington Transit -0.3177385 -33.5
Mississippi Parks 0.3162382 -25.0
Arkansas Retail_Recreation -0.3157899 -30.0
Texas Transit 0.3112692 -42.0
North Dakota Workplace 0.3111708 -33.5
Idaho Parks 0.3084098 -22.0
Virginia Workplace -0.3054002 -32.0
Florida Transit -0.3048705 -49.0
Colorado Grocery_Pharmacy -0.3045091 -17.0
Illinois Workplace -0.2985202 -30.0
Mississippi Grocery_Pharmacy -0.2968257 -8.0
New York Grocery_Pharmacy -0.2931932 8.0
Virginia Grocery_Pharmacy -0.2866612 -8.0
Pennsylvania Parks 0.2837927 13.0
New Hampshire Residential -0.2819528 14.0
Colorado Transit -0.2809509 -36.0
Maine Grocery_Pharmacy -0.2802607 -13.0
New Jersey Residential 0.2782457 18.0
Oregon Residential 0.2721056 10.5
Kentucky Parks 0.2715847 28.5
Arkansas Residential 0.2657737 12.0
Texas Residential -0.2557815 15.0
Georgia Grocery_Pharmacy -0.2552937 -10.0
Florida Workplace -0.2524014 -33.0
Rhode Island Transit -0.2516145 -56.0
Hawaii Residential -0.2498230 19.0
New Hampshire Grocery_Pharmacy -0.2455163 -6.0
Iowa Workplace -0.2453980 -29.0
Massachusetts Residential 0.2439284 15.0
Tennessee Retail_Recreation -0.2408655 -30.0
Indiana Grocery_Pharmacy -0.2396749 -5.5
Maryland Residential 0.2375251 15.0
Michigan Grocery_Pharmacy -0.2374533 -11.0
Nebraska Grocery_Pharmacy -0.2368763 0.0
Maine Retail_Recreation -0.2351854 -42.0
West Virginia Grocery_Pharmacy -0.2321920 -6.0
Texas Parks 0.2321367 -42.0
Kansas Retail_Recreation -0.2294313 -39.0
North Dakota Residential 0.2292251 17.0
Pennsylvania Grocery_Pharmacy -0.2271322 -6.0
Oklahoma Grocery_Pharmacy 0.2181002 -0.5
South Carolina Residential 0.2177518 12.0
Georgia Retail_Recreation -0.2171438 -41.0
Alabama Residential 0.2153180 11.0
Wisconsin Parks 0.2143328 51.5
Virginia Residential 0.2088566 14.0
Georgia Workplace -0.2064842 -33.5
Michigan Retail_Recreation -0.2062983 -53.0
Washington Workplace -0.2054725 -38.0
Alabama Grocery_Pharmacy -0.2053822 -2.0
Alabama Transit -0.2042660 -36.5
West Virginia Retail_Recreation 0.2024442 -38.5
Iowa Residential -0.2001295 13.0
Washington Parks 0.1982925 -3.5
Nevada Residential 0.1979938 17.0
Oklahoma Residential 0.1973111 15.0
South Dakota Retail_Recreation -0.1962572 -38.5
South Carolina Workplace 0.1954623 -30.0
Ohio Transit 0.1946730 -28.0
Alabama Parks 0.1944637 -1.0
North Carolina Retail_Recreation -0.1883056 -33.0
Michigan Parks 0.1872264 30.0
Tennessee Grocery_Pharmacy -0.1865674 6.0
Kentucky Workplace -0.1863435 -35.0
Tennessee Residential 0.1854715 11.5
West Virginia Workplace 0.1850232 -32.5
Nebraska Residential -0.1834972 14.0
Wisconsin Workplace -0.1806554 -31.0
New Hampshire Transit -0.1800090 -57.0
Arizona Workplace -0.1790383 -35.0
Illinois Residential 0.1777245 14.0
Texas Workplace 0.1725234 -31.0
Oklahoma Workplace -0.1709318 -30.5
South Carolina Parks -0.1691941 -23.0
Arkansas Workplace -0.1686989 -26.0
New Hampshire Retail_Recreation -0.1674223 -41.0
Nevada Workplace -0.1662423 -40.0
South Carolina Retail_Recreation -0.1660328 -35.0
Missouri Transit -0.1630578 -23.0
North Carolina Transit 0.1588124 -32.0
Oklahoma Retail_Recreation 0.1583250 -31.0
South Dakota Grocery_Pharmacy 0.1553864 -9.0
Indiana Retail_Recreation -0.1550508 -38.0
Oregon Grocery_Pharmacy 0.1532142 -7.0
Pennsylvania Transit -0.1519722 -41.5
Florida Grocery_Pharmacy -0.1514709 -14.0
Tennessee Workplace -0.1483637 -31.0
Wisconsin Residential -0.1478320 14.0
Wisconsin Grocery_Pharmacy 0.1428404 -1.5
Maine Residential -0.1386968 11.0
Georgia Residential -0.1365163 13.0
Oklahoma Parks -0.1362105 -18.5
Idaho Residential -0.1345555 11.0
Nebraska Transit 0.1327007 -11.5
Illinois Retail_Recreation -0.1289100 -40.0
Arizona Parks 0.1246325 -44.5
Nebraska Workplace 0.1243324 -32.5
Florida Retail_Recreation -0.1240353 -43.0
Pennsylvania Residential 0.1237324 15.0
Minnesota Retail_Recreation 0.1223654 -40.5
Kentucky Residential 0.1187336 12.0
Ohio Residential 0.1181871 14.0
North Carolina Parks -0.1168981 7.0
Vermont Workplace -0.1127410 -43.0
Tennessee Parks 0.1087016 10.5
Ohio Retail_Recreation 0.1038262 -36.0
Virginia Parks 0.1037221 6.0
Mississippi Workplace -0.1028413 -33.0
Washington Residential 0.0939541 13.0
Illinois Grocery_Pharmacy -0.0935639 2.0
Wisconsin Retail_Recreation -0.0923467 -44.0
Washington Retail_Recreation -0.0912974 -42.0
Kentucky Transit 0.0891570 -31.0
New Mexico Transit 0.0881449 -37.0
Michigan Residential 0.0878612 15.0
Indiana Workplace -0.0875347 -34.0
Illinois Parks 0.0875275 26.5
Maryland Parks 0.0864502 27.0
New Mexico Workplace -0.0864431 -34.0
Hawaii Workplace -0.0808873 -46.0
New York Residential 0.0797617 17.5
Connecticut Parks 0.0780428 43.0
Arkansas Transit 0.0735323 -27.0
Nebraska Parks 0.0726725 55.5
Oregon Transit -0.0723629 -28.0
North Carolina Grocery_Pharmacy 0.0716555 1.0
Nevada Parks -0.0700445 -12.5
Iowa Parks -0.0698225 28.5
New Hampshire Workplace -0.0665849 -37.0
Ohio Grocery_Pharmacy 0.0643872 0.0
Indiana Residential 0.0618211 12.0
Kansas Residential -0.0603836 13.0
Nebraska Retail_Recreation 0.0587788 -37.5
Missouri Grocery_Pharmacy -0.0580754 2.0
Missouri Retail_Recreation -0.0575486 -36.5
Texas Retail_Recreation -0.0556309 -39.0
Kansas Transit -0.0519103 -26.5
Minnesota Workplace -0.0517544 -33.0
Kansas Workplace -0.0482002 -31.5
North Carolina Residential 0.0478427 13.0
Iowa Retail_Recreation -0.0469994 -37.0
Indiana Parks -0.0469890 29.0
Maryland Transit -0.0467283 -39.0
South Carolina Transit -0.0457601 -45.0
Missouri Workplace 0.0443428 -28.5
Georgia Transit -0.0427605 -35.0
Michigan Transit 0.0427330 -46.0
Mississippi Retail_Recreation 0.0427132 -40.0
Massachusetts Parks -0.0404740 39.0
Washington Grocery_Pharmacy -0.0398682 -7.0
Oregon Workplace -0.0395589 -32.0
Iowa Grocery_Pharmacy -0.0388311 4.0
Tennessee Transit 0.0383680 -32.0
Nevada Grocery_Pharmacy -0.0381444 -11.0
South Carolina Grocery_Pharmacy -0.0340216 1.0
Minnesota Grocery_Pharmacy -0.0336416 -5.0
West Virginia Transit 0.0329966 -45.0
Missouri Parks 0.0289776 0.0
Kentucky Retail_Recreation 0.0274012 -29.0
Georgia Parks -0.0273686 -6.0
West Virginia Residential 0.0262065 11.0
Mississippi Residential 0.0259534 13.0
North Dakota Transit -0.0248354 -48.0
South Dakota Residential 0.0243392 15.0
Minnesota Residential 0.0238146 17.0
North Carolina Workplace 0.0231928 -31.0
Vermont Transit 0.0227965 -63.0
Mississippi Transit -0.0214496 -38.5
Alabama Retail_Recreation 0.0213752 -39.0
Ohio Workplace -0.0192275 -35.0
Iowa Transit 0.0173974 -25.0
Oklahoma Transit 0.0172117 -26.0
Arkansas Grocery_Pharmacy -0.0170389 3.5
Kentucky Grocery_Pharmacy 0.0162747 4.0
Indiana Transit -0.0145231 -29.0
Texas Grocery_Pharmacy -0.0144226 -13.0
Ohio Parks 0.0135843 67.5
South Dakota Workplace 0.0100289 -35.0
Missouri Residential 0.0081337 13.0
Oregon Retail_Recreation 0.0075100 -41.0
Alaska Parks NA 29.0
District of Columbia Retail_Recreation NA -69.0
District of Columbia Grocery_Pharmacy NA -28.0
District of Columbia Parks NA -65.0
District of Columbia Transit NA -69.0
District of Columbia Workplace NA -48.0
District of Columbia Residential NA 17.0
# sanity check
ggplot(filter(plot_data,Province.State %in% c("Pennsylvania","Maryland","New Jersey","California","Delaware","Connecticut")),aes(x=Total_confirmed_cases.per100,fill=variable))+geom_histogram()+
  facet_grid(~Province.State)+
    default_theme+
  theme(legend.position = "bottom")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

write_plot(mobility.plot,wd = results_dir)
## [1] "/Users/stevensmith/Projects/coronavirus/results/mobility.plot.png"
write_plot(mobility.global.plot,wd = results_dir)
## [1] "/Users/stevensmith/Projects/coronavirus/results/mobility.global.plot.png"
(plot_data.permobility_summary.plot<-ggplot(plot_data.permobility_summary,aes(x=variable,y=median_change))+
  geom_jitter(size=2,width=.2)+
  #geom_jitter(data=plot_data.permobility_summary %>% arrange(-abs(median_change)) %>% head(n=15),aes(col=Province.State),size=2,width=.2)+
  default_theme+
  ggtitle("Per-Sate Median Change in Mobility")+
  xlab("Mobility Meaure")+
  ylab("Median Change from Baseline"))

write_plot(plot_data.permobility_summary.plot,wd = results_dir)
## [1] "/Users/stevensmith/Projects/coronavirus/results/plot_data.permobility_summary.plot.png"

DELIVERABLE MANIFEST

The following link to commited documents pushed to github. These are provided as a convienence, but note this is a manual process. The generation of reports, plots and tables is not coupled to the execution of this markdown. ## Report This report, html & pdf

Plots

github_root<-"https://github.com/sbs87/coronavirus/blob/master/"

plot_handle<-c("Corona_Cases.world.long.plot",
               "Corona_Cases.world.loglong.plot",
               "Corona_Cases.world.mortality.plot",
               "Corona_Cases.world.casecor.plot",
               "Corona_Cases.city.long.plot",
               "Corona_Cases.city.loglong.plot",
               "Corona_Cases.city.mortality.plot",
               "Corona_Cases.city.casecor.plot",
               "Corona_Cases.city.long.normalized.plot",
               "Corona_Cases.US_state.lm.plot",
               "Corona_Cases.US_state.summary.plot")


deliverable_manifest<-data.frame(
  name=c("World total & death cases, longitudinal",
         "World log total & death cases, longitudinal",
         "World mortality",
         "World total & death cases, correlation",
         "City total & death cases, longitudinal",
         "City log total & death cases, longitudinal",
         "City mortality",
         "City total & death cases, correlation",
         "City population normalized total & death cases, longitudinal",
         "State total cases (select) with linear model, longitudinal",
         "State total cases, longitudinal"),
  plot_handle=plot_handle,
  link=paste0(github_root,"results/",plot_handle,".png")
)


(tmp<-data.frame(row_out=apply(deliverable_manifest,MARGIN = 1,FUN = function(x) paste(x[1],x[2],x[3],sep=" | "))))
##                                                                                                                                                                                                        row_out
## 1                                           World total & death cases, longitudinal | Corona_Cases.world.long.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.world.long.plot.png
## 2                                 World log total & death cases, longitudinal | Corona_Cases.world.loglong.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.world.loglong.plot.png
## 3                                                         World mortality | Corona_Cases.world.mortality.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.world.mortality.plot.png
## 4                                      World total & death cases, correlation | Corona_Cases.world.casecor.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.world.casecor.plot.png
## 5                                              City total & death cases, longitudinal | Corona_Cases.city.long.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.city.long.plot.png
## 6                                    City log total & death cases, longitudinal | Corona_Cases.city.loglong.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.city.loglong.plot.png
## 7                                                            City mortality | Corona_Cases.city.mortality.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.city.mortality.plot.png
## 8                                         City total & death cases, correlation | Corona_Cases.city.casecor.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.city.casecor.plot.png
## 9  City population normalized total & death cases, longitudinal | Corona_Cases.city.long.normalized.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.city.long.normalized.plot.png
## 10                     State total cases (select) with linear model, longitudinal | Corona_Cases.US_state.lm.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.US_state.lm.plot.png
## 11                                      State total cases, longitudinal | Corona_Cases.US_state.summary.plot | https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.US_state.summary.plot.png
row_out<-apply(tmp, 2, paste, collapse="\t\n")
name handle link
World total & death cases, longitudinal Corona_Cases.world.long.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.world.long.plot.png
World log total & death cases, longitudinal Corona_Cases.world.loglong.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.world.loglong.plot.png
World mortality Corona_Cases.world.mortality.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.world.mortality.plot.png
World total & death cases, correlation Corona_Cases.world.casecor.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.world.casecor.plot.png
City total & death cases, longitudinal Corona_Cases.city.long.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.city.long.plot.png
City log total & death cases, longitudinal Corona_Cases.city.loglong.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.city.loglong.plot.png
City mortality Corona_Cases.city.mortality.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.city.mortality.plot.png
City total & death cases, correlation Corona_Cases.city.casecor.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.city.casecor.plot.png
City population normalized total & death cases, longitudinal Corona_Cases.city.long.normalized.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.city.long.normalized.plot.png
State total cases (select) with linear model, longitudinal Corona_Cases.US_state.lm.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.US_state.lm.plot.png
State total cases, longitudinal Corona_Cases.US_state.summary.plot https://github.com/sbs87/coronavirus/blob/master/results/Corona_Cases.US_state.summary.plot.png

Tables

CONCLUSION

Overall, the trends of COVID-19 cases is no longer in log-linear phase for world or U.S. (but some regions like MD are still in the log-linear phase). Mortality rate (deaths/confirmed RNA-based cases) is >1%, with a range depending on region. Mobility is not a strong indicator of caseload (U.S. data).

See table below for detailed breakdown.

Question Answer
What is the effect on social distancing, descreased mobility on case load?
There is not a strong apparent effect on decreased mobility (work, grocery, retail) or increased mobility (at residence, parks) on number of confirmed cases, either as a country (U.S.) or state level. California appears to have one of the best correlations, but this is a mixed bag
What is the trend in cases, mortality across geopgraphical regions?
The confirmed total casees and mortality is overall log-linear for most countries, with a trailing off beginning for most (inlcuding U.S.). On the state level, NY, NJ, PA starting to trail off; MD is still in log-linear phase. Mortality and case load are highly correlated for NY, NJ, PA, MD. The mortality rate flucutates for a given region, but is about 3% overall.

END

End: ##—— Mon Apr 20 20:54:50 2020 ——##

Cheatsheet: http://rmarkdown.rstudio.com>

Sandbox

# Geographical heatmap!
install.packages("maps")
library(maps)
library
mi_counties <- map_data("county", "pennsylvania") %>% 
  select(lon = long, lat, group, id = subregion)
head(mi_counties)

ggplot(mi_counties, aes(lon, lat)) + 
  geom_point(size = .25, show.legend = FALSE) +
  coord_quickmap()
mi_counties$cases<-1:2226
name_overlaps(metadata,Corona_Cases.US_state)

tmp<-merge(Corona_Cases.US_state,metadata)
ggplot(filter(tmp,Province.State=="Pennsylvania"), aes(Long, Lat, group = as.factor(City))) +
  geom_polygon(aes(fill = Total_confirmed_cases), colour = "grey50") + 
  coord_quickmap()


ggplot(Corona_Cases.US_state, aes(Long, Lat))+
  geom_polygon(aes(fill = Total_confirmed_cases ), color = "white")+
  scale_fill_viridis_c(option = "C")
dev.off()


require(maps)
require(viridis)

world_map <- map_data("world")
ggplot(world_map, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill="lightgray", colour = "white")

head(world_map)
head(Corona_Cases.US_state)
unique(select(world_map,c("region","group"))) %>% filter()

some.eu.countries <- c(
  "US"
)
# Retrievethe map data
some.eu.maps <- map_data("world", region = some.eu.countries)

# Compute the centroid as the mean longitude and lattitude
# Used as label coordinate for country's names
region.lab.data <- some.eu.maps %>%
  group_by(region) %>%
  summarise(long = mean(long), lat = mean(lat))

unique(filter(some.eu.maps,subregion %in% Corona_Cases.US_state$Province.State) %>% select(subregion))
unique(Corona_Cases.US_state$Total_confirmed_cases.log)
ggplot(filter(Corona_Cases.US_state,Date=="2020-04-17") aes(x = Long, y = Lat)) +
  geom_polygon(aes( fill = Total_confirmed_cases.log))+
  #geom_text(aes(label = region), data = region.lab.data,  size = 3, hjust = 0.5)+
  #scale_fill_viridis_d()+
  #theme_void()+
  theme(legend.position = "none")
library("sf")
library("rnaturalearth")
library("rnaturalearthdata")

world <- ne_countries(scale = "medium", returnclass = "sf")
class(world)
ggplot(data = world) +
    geom_sf()

counties <- st_as_sf(map("county", plot = FALSE, fill = TRUE))
counties <- subset(counties, grepl("florida", counties$ID))
counties$area <- as.numeric(st_area(counties))
#install.packages("lwgeom")
class(counties)
head(counties)
ggplot(data = world) +
    geom_sf(data=Corona_Cases.US_state) +
    #geom_sf(data = counties, aes(fill = area)) +
  geom_sf(data = counties, aes(fill = area)) +
   # scale_fill_viridis_c(trans = "sqrt", alpha = .4) +
    coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE)


head(counties)
tmp<-unique(select(filter(Corona_Cases.US_state,Date=="2020-04-17"),c(Lat,Long,Total_confirmed_cases.per100)))
st_as_sf(map("county", plot = FALSE, fill = TRUE))

join::inner_join.sf(Corona_Cases.US_state, counties)

library(sf)
library(sp)

nc <- st_read(system.file("shape/nc.shp", package="sf"))
class(nc)


spdf <- SpatialPointsDataFrame(coords = select(Corona_Cases.US_state,c("Lat","Long")), data = Corona_Cases.US_state,
                               proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))

head(spdf)
class(spdf)
st_cast(spdf)
id

https://stevenbsmith.net